Getting Started with Assembly Language Programming
Contents
Getting Started
What Do All Those Instructions Do?
Addressing
32-Bit Mode
Floating Point
Welcome to ASM-101! Here you'll quickly learn to program the IBM-PC in assembly language. It's assumed that you have some programming experience (perhaps even with other assembly languages) and know how to run programs from a DOS prompt. (Windows XP users: Run... "cmd".)
To get started, locate the file ASM101.ZIP. It's in the bonus pack accompanying this article. Make a directory called ASM and unzip the files into it. This provides the example programs and a few other goodies.
You also need the Microsoft assembler, MASM 6.14. A free copy can be downloaded; the Links section at the end of this article points to a page that explains how. Download the zip file "Assembler and the Linker". Note that this version must be run under Windows. (Although it's not necessary, while you're at it you might want to download the zip file for the excellent book "Advanced MS-DOS Programming", subtitled "The Microsoft Guide for Assembly Language and C Programmers".)
You also need some kind of plain ASCII text editor. If you don't have a favorite, EDIT, the one supplied with DOS (and Windows), will do nicely.
Start
Our first example does nothing ... but it's a start. ;)
cseg segment
assume cs:cseg, ds:cseg, ss:cseg, es:cseg
org 100h
start:
ret
cseg ends
end start
Type "ASM START" [Enter]. This runs a batch file that runs the assembler and creates an executable file called START.COM.
Go ahead and execute it by typing "START". (Windows XP users must type "START.COM" to prevent running the XP command "start".) See, it does nothing. However it is useful. It provides the peculiar setup that the Microsoft assembler (MASM) needs, and it serves as the starting point for all the programs we'll write.
2plus3
Now let's make START.ASM do something. We'll insert a few lines and create the file called 2PLUS3.ASM. This is already done for you, so you don't need to type it.
cseg segment
assume cs:cseg, ds:cseg, ss:cseg, es:cseg
org 100h
start:
MOV AL, 2
ADD AL, 3
OR AL, 30H ;convert digit to ASCII
MOV DL, AL ;DOS wants the char in the DL register
MOV AH, 02H ;select DOS function to display a char
INT 21H ;call DOS routine
ret
cseg ends
end start
The newly inserted lines
MOV AL, 2
ADD AL, 3
show how to do calculations in assembly language. (We're really getting somewhere!) The value "2" is MOVed into the AL register. Then the value "3" is ADDed to it. The "AL register" is simply one of several temporary storage locations in the heart of the processor.
Intel processors use the convention of moving data from right to left instead of left to right, as is done by many other processors. The source is on the right and the destination is on the left. If this seems backwards, consider the direction used in most high level languages, for example: A = 2 + 3.
Note that either upper or lower case characters can be used.
Great! We can do calculations, but they're useless unless we can see the answer.
The next line
OR AL, 30H ;convert digit to ASCII
converts the binary value in the AL register into an ASCII code that can be displayed.
The OR instruction (as in most processors) does a bitwise OR operation. Here it converts the value 5 into 35 hex (53 decimal), which is the ASCII code for the character "5".
Note that hexadecimal numbers are marked with a trailing "H". The line also contains a comment. Any characters that follow a semicolon (;) are ignored by the assembler (up until the carriage return and linefeed at the end of the line).
This is just one of many ways to convert a digit to ASCII. We could also use
ADD AL, 48
but this obscures the fact that we're setting bits 4 and 5.
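Here's a picture of the operation. The AL register is 8 bits wide. The least significant bit (bit 0) is shown at the right.
    bit number   7 6 5 4 3 2 1 0
    AL = 5       0 0 0 0 0 1 0 1
    OR 30H       0 0 1 1 0 0 0 0
    result       0 0 1 1 0 1 0 1   = 35H, ASCII "5"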
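The OR trick only works while the answer is a single digit (0 through 9). As a rough sketch of one way to handle a two-digit answer, the program below uses the AAM instruction to split the value into tens and ones, then displays each digit with the same DOS output lines used in 2PLUS3.ASM. The numbers 7 and 8 are just illustrative, and this is only one of several possible approaches.

```asm
cseg    segment
        assume cs:cseg, ds:cseg, ss:cseg, es:cseg
        org 100h
start:
        MOV AL, 7
        ADD AL, 8       ;AL = 15, too big for a single ASCII digit
        AAM             ;AH = AL / 10 (tens), AL = AL mod 10 (ones)
        OR AX, 3030H    ;convert both digits to ASCII at once
        MOV BX, AX      ;save both digits; the DOS call below changes AL
        MOV DL, BH      ;tens digit first
        MOV AH, 02H     ;select DOS function to display a char
        INT 21H
        MOV DL, BL      ;then the ones digit
        MOV AH, 02H
        INT 21H
        ret
cseg    ends
        end start
```

This should display "15". It is limited to answers from 0 through 99, since AAM produces only a tens digit and a ones digit.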
Output
Now that we have an ASCII character, we can output it. There are many ways that this can be done, but normally DOS (Disk Operating System) or BIOS (Basic Input/Output System) subroutine calls are used. Here's how to do it with a DOS call:
MOV DL, AL ;DOS wants the char in the DL register
MOV AH, 02H ;select DOS function to display a char
INT 21H ;call DOS routine
DOS and BIOS calls are done using the INT instruction. INT stands for "interrupt", which is a little misleading. Normally "interrupt" refers to a hardware signal that interrupts what the processor is doing and makes it jump to a routine called an interrupt handler. When software uses the INT instruction, it's essentially the same as calling a subroutine.
DOS has been assigned interrupt vector 21H. That's 33 in decimal, and since vectors are numbered starting from 0, it's the 34th vector (count it out). There are 256 interrupt vectors. Some are used by the processor and hardware (divide-by-zero, for example, is vector 0), some are used by software, and many are unused. The AH register is used to specify the desired DOS routine. In our example, we're calling the character output routine, which is "function" number 2.
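To see how AH picks the routine, here is a small sketch that makes two different DOS calls through the same INT 21H. It assumes DOS function 08H (wait for a key without echoing it, character returned in AL), as listed in Ralf Brown's Interrupt List, and then hands the character to function 02H.

```asm
cseg    segment
        assume cs:cseg, ds:cseg, ss:cseg, es:cseg
        org 100h
start:
        MOV AH, 08H     ;DOS function 08H: wait for a key, no echo
        INT 21H         ;the character comes back in AL
        MOV DL, AL      ;DOS wants the char in the DL register
        MOV AH, 02H     ;DOS function 02H: display the char in DL
        INT 21H
        ret
cseg    ends
        end start
```

Same INT instruction, different value in AH, different routine. Ordinary keys are displayed as typed; function keys and arrow keys return two bytes and would need a second call, which this sketch ignores.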
Here's another way of displaying the character, this time using a BIOS call:
MOV BH, 0
MOV AH, 0EH
INT 10H
In addition to the ASCII value in the AL register, this call requires two other pieces of information. The AH register specifies the function that will be executed by the INT 10H call. The BH register must be set to zero. This selects page 0, which is the chunk of video memory that is normally visible. (Other pages can be written to and later made visible, but this feature is rarely used.)
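For reference, here is what 2PLUS3.ASM might look like with the DOS call swapped for the BIOS call. It is only a sketch: the three output lines are the ones shown above, dropped into the same framework.

```asm
cseg    segment
        assume cs:cseg, ds:cseg, ss:cseg, es:cseg
        org 100h
start:
        MOV AL, 2
        ADD AL, 3
        OR AL, 30H      ;convert digit to ASCII
        MOV BH, 0       ;page 0, the normally visible page
        MOV AH, 0EH     ;select BIOS function to display the char in AL
        INT 10H         ;call BIOS video routine
        ret
cseg    ends
        end start
```

Note that the character stays in AL here, so the MOV DL, AL line needed by the DOS version disappears.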
Here's a nice, simple way of displaying what's in the AL register:
INT 29H
This was once an undocumented feature exploited by Microsoft, but it has since become standard usage.
Yet another way to display a character is to write it directly into video memory:
MOV BX, 0B800H
MOV ES, BX
MOV ES:[6], AL
This writes the character into the 4th position on the screen. There are two bytes per screen position. The first byte has an offset of zero. Thus the bytes corresponding to character positions are numbered: 0, 2, 4, 6.... A byte index of 6 is the 4th character position. The odd numbered bytes are used to set the color of a character.
If you execute this code, be aware that the "5" may scroll off the top of the screen before you can see it. If this happens, use the DOS command CLS (clear screen) before executing the code.
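Position and color can be combined in one small program. The sketch below assumes the usual 80-column color text mode, where each row takes 80 * 2 = 160 bytes, so the byte offset of a character is (row * 80 + column) * 2 with rows and columns counted from 0. The row, column and color values are arbitrary choices for illustration.

```asm
cseg    segment
        assume cs:cseg, ds:cseg, ss:cseg, es:cseg
        org 100h
start:
        MOV BX, 0B800H
        MOV ES, BX                      ;ES points to video text memory
        MOV DI, (12*80 + 40)*2          ;byte offset of row 12, column 40
        MOV BYTE PTR ES:[DI], '5'       ;even byte: the character
        MOV BYTE PTR ES:[DI+1], 1EH     ;odd byte: color (yellow on blue)
        ret
cseg    ends
        end start
```

The assembler works out (12*80 + 40)*2 = 2000 at assembly time, so no multiply instructions are executed. Because the "5" lands near the middle of the screen, it is much less likely to scroll out of view.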
In the example, the extra segment (ES) register is used to point to the base of video text memory, at physical address B8000h. Take a close look at those 0's. The leading 0 in 0B800H is how the assembler distinguishes a number from a name. (Numbers start with 0-9; names start with A-Z, a-z, or a few other characters such as the underscore.) The missing 0 at the end of 0B800H is explained below.
Segment addressing was a major feature that distinguished the 8088 processor used in the original IBM-PC from other processors used at the time, such as the Z80 or 6502. This enabled it to address up to 1 megabyte of memory rather than being limited to 64 kilobytes. This ability, however, came at the price of some confusion.
All references to memory addresses in the PC involve the use of segment registers. Usually these registers are implied rather than being explicitly stated, as in the above example with the ES register. All instructions are fetched relative to the code segment (CS) register; data are normally read and stored relative to the data segment (DS) register; and the stack, including all subroutine return addresses, is addressed relative to the stack segment (SS) register.
The actual physical address used by the processor is determined by the value in a segment register multiplied by 16 then added to an offset address. In the example above, the video memory address used is B800h * 16 + 6. This mixture of hex and decimal can be rewritten as B800h * 10h + 6, which equals B8006h.
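One consequence of this arithmetic is that many different segment and offset pairs reach the same physical memory. As a sketch, the program below writes one "5" using segment B800h, then writes a second one right next to it using segment B000h and a much larger offset. The second pair of values is chosen purely for illustration.

```asm
cseg    segment
        assume cs:cseg, ds:cseg, ss:cseg, es:cseg
        org 100h
start:
        MOV AL, '5'
        MOV BX, 0B800H
        MOV ES, BX
        MOV ES:[6], AL          ;B800h * 10h + 0006h = B8006h (4th position)
        MOV BX, 0B000H
        MOV ES, BX
        MOV BX, 8008H
        MOV ES:[BX], AL         ;B000h * 10h + 8008h = B8008h (5th position)
        ret
cseg    ends
        end start
```

Both writes land in the same block of video memory even though the segment values differ by 800h, because 800h * 10h = 8000h is made up in the offset.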
Don't be intimidated by segment addressing. It's only when programs and data approach 64K that you need to give it much thought.
Hello World!
Here's the traditional "Hello World!" program. This displays the message using DOS function 9.
MOV DX, OFFSET MSG
MOV AH, 09H
INT 21H
ret
MSG DB 'Hello World!$'
The word OFFSET is used to specify that the address of the message (MSG) is to be moved into DX, rather than the contents of the MSG location.
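To make the difference concrete, here is a sketch that uses MSG both ways. Without OFFSET, the assembler generates a load of the byte stored at MSG; with OFFSET, it loads the address. The first part reuses DOS function 02H from the earlier examples.

```asm
cseg    segment
        assume cs:cseg, ds:cseg, ss:cseg, es:cseg
        org 100h
start:
        MOV DL, MSG             ;no OFFSET: DL = contents of MSG, the byte 'H'
        MOV AH, 02H             ;display that single character
        INT 21H
        MOV DX, OFFSET MSG      ;with OFFSET: DX = address of MSG
        MOV AH, 09H             ;display the whole message
        INT 21H
        ret
MSG     DB 'Hello World!$'
cseg    ends
        end start
```

The first call displays the single character "H" and the second displays the full message.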
DB is not an instruction executed by the processor, but rather it's a command to the assembler to "Define Bytes" of memory. Characters enclosed in single quotes are stored into sequential bytes. The dollar sign ($) is the DOS convention for terminating a message. (A much better convention, used almost everywhere else, is to terminate strings with a zero byte.)
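For comparison, here is a sketch of the zero-terminated convention. DOS function 9 can't be used because it only stops at a dollar sign, so the program steps through the string itself and displays one character at a time with function 02H. The label names and the message text are just illustrative.

```asm
cseg    segment
        assume cs:cseg, ds:cseg, ss:cseg, es:cseg
        org 100h
start:
        MOV SI, OFFSET MSG      ;SI points at the first character
print_loop:
        MOV DL, [SI]            ;fetch the next character (DS is implied)
        CMP DL, 0               ;a zero byte marks the end of the string
        JE done
        MOV AH, 02H             ;DOS function 02H: display the char in DL
        INT 21H
        INC SI                  ;advance to the next character
        JMP print_loop
done:
        ret
MSG     DB 'Cost: $5', 0
cseg    ends
        end start
```

A side benefit is that the message can now contain a dollar sign, which function 9 has no way to display.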
Links
Ralf Brown's Interrupt List contains essential information about DOS and BIOS functions:
http://www.ctyme.com/rbrown.htm
If you want a free book that explains assembly language programming in detail, it's at:
http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/0_ArtofAsm.html
Here's hard core information on the Pentium from the source:
http://www.intel.com/design/intarch/Techinfo/Pentium/index.htm
You can download a free copy of the assembler (6.15) and linker (5.60) directly from Microsoft. This website explains how:
http://users.easystreet.com/jkirwan/pctools.html